Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data

Authors

  • Nicola Lama
  • Patrizia Boracchi
  • Elia Biganzoli
Abstract

Intensity-dependent normalization methods currently used for gene expression data, which are based on regression smoothing techniques, usually address the two problems of location bias detrending and data re-scaling without taking into account the censoring of certain gene expression values produced by experimental measurement constraints or by previous normalization steps. Moreover, control of the bias-variance balance in normalization procedures is rarely discussed and is instead left to the user's experience. Here, an approximate maximum likelihood procedure is presented for fitting a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Central tendency and scaling factor were modeled by means of B-spline smoothing. As an alternative to outlier theory and robust methods, our approach is to look for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, while addressing the problem of censoring. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were adopted for model selection. A Monte Carlo evaluation was performed to investigate the goodness of fit of the selected models. Randomized quantiles are used to produce normally distributed adjusted data. The analysis was performed on a pre-processed, publicly available dataset with censored gene expression data, published in a breast cancer microarray study. Results obtained from the different models suggest that the Asymmetric Laplace distribution produces the best-fitting models. The AIC and BIC favor models with different levels of flexibility for the various arrays; BIC tended to produce more parsimonious best-fitting models. Comparison of model-generated data with observed microarray data indicated reasonable fits for the models evaluated.
The proposed approach provides a way to model the distribution of gene expression data as a function of the mean intensity value, controlling for different types of censoring. Information criteria could help avoid the potential systematic distortion caused by poor control of the bias-variance balance. The Laplace distribution should be considered in future research on parametric error modeling.
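
The ingredients named in the abstract (a censored likelihood under an Asymmetric Laplace error model, AIC/BIC, and randomized quantiles) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a constant location `mu` and scale `sigma`, whereas the paper models both as B-spline functions of average intensity, and it assumes the quantile-regression parametrization of the Asymmetric Laplace distribution with skewness `p`. All function names and the toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ald_logpdf(y, mu, sigma, p):
    """Log-density of the asymmetric Laplace distribution, skewness p in (0, 1)."""
    u = (y - mu) / sigma
    rho = u * (p - (1.0 if u < 0 else 0.0))  # check-function loss
    return math.log(p * (1.0 - p) / sigma) - rho


def ald_cdf(y, mu, sigma, p):
    """Cumulative distribution function matching ald_logpdf."""
    u = (y - mu) / sigma
    if u <= 0:
        return p * math.exp((1.0 - p) * u)
    return 1.0 - (1.0 - p) * math.exp(-p * u)


def censored_loglik(data, mu, sigma, p):
    """data: list of (value, status), status in {"obs", "left", "right"}.

    Observed values contribute the log-density; censored values contribute
    the log-probability of lying beyond the censoring limit.
    """
    total = 0.0
    for y, status in data:
        if status == "obs":
            total += ald_logpdf(y, mu, sigma, p)
        elif status == "left":
            total += math.log(ald_cdf(y, mu, sigma, p))
        else:  # right-censored
            total += math.log(1.0 - ald_cdf(y, mu, sigma, p))
    return total


def aic(loglik, k):
    """Akaike Information Criterion for k free parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def randomized_quantile_residual(y, status, mu, sigma, p, rng):
    """Map one observation to the standard normal scale.

    For a censored value, the probability is drawn uniformly from the
    interval the model assigns to the censored region.
    """
    f = ald_cdf(y, mu, sigma, p)
    if status == "left":
        u = rng.uniform(0.0, f)
    elif status == "right":
        u = rng.uniform(f, 1.0)
    else:
        u = f
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return NormalDist().inv_cdf(u)


# Toy data (hypothetical): three observed log-ratios and two censored ones.
data = [(-0.4, "obs"), (0.1, "obs"), (0.3, "obs"), (-1.0, "left"), (1.0, "right")]
ll = censored_loglik(data, mu=0.0, sigma=0.5, p=0.5)
print("AIC:", aic(ll, k=3), "BIC:", bic(ll, k=3, n=len(data)))
```

In an actual fit, `mu`, `sigma` and `p` would be estimated by maximizing `censored_loglik`, and `k` would count the spline coefficients as well. Because the BIC penalty grows with log n, it exceeds the AIC penalty once n is larger than about 7, which is consistent with the abstract's remark that BIC tended to select more parsimonious models.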



Journal:
  • Computational Statistics & Data Analysis

Volume 53  Issue 

Pages  -

Publication date 2009